segment model
Estimating regression errors without ground truth values
Tiittanen, Henri, Oikarinen, Emilia, Henelius, Andreas, Puolamäki, Kai
Regression analysis is a standard supervised machine learning method used to model an outcome variable in terms of a set of predictor variables. In most real-world applications we do not know the true value of the outcome variable being predicted outside the training data, i.e., the ground truth is unknown. It is hence not straightforward to directly observe when the estimate from a model potentially is wrong, due to phenomena such as overfitting and concept drift. In this paper we present an efficient framework for estimating the generalization error of regression functions, applicable to any family of regression functions when the ground truth is unknown. We present a theoretical derivation of the framework and empirically evaluate its strengths and limitations. We find that it performs robustly and is useful for detecting concept drift in datasets in several real-world domains.
- North America > United States (0.28)
- Europe > Austria > Vienna (0.14)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.48)
- Government > Regional Government (0.68)
- Transportation > Air (0.46)
- Consumer Products & Services > Travel (0.46)
Efficient Bayesian analysis of multiple changepoint models with dependence across segments
We consider Bayesian analysis of a class of multiple changepoint models. While there are a variety of efficient ways to analyse these models if the parameters associated with each segment are independent, there are few general approaches for models where the parameters are dependent. Under the assumption that the dependence is Markov, we propose an efficient online algorithm for sampling from an approximation to the posterior distribution of the number and position of the changepoints. In a simulation study, we show that the approximation introduced is negligible. We illustrate the power of our approach through fitting piecewise polynomial models to data, under a model which allows for either continuity or discontinuity of the underlying curve at each changepoint. This method is competitive with, or out-performs, other methods for inferring curves from noisy data; and uniquely it allows for inference of the locations of discontinuities in the underlying curve.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
- Information Technology > Modeling & Simulation (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)